Useful tips

All of the things in this document

Difficulty setting

Charles T. Gray plays on normal tidyverse difficulty setting.

Gaming difficulty:

  • the pew-pew sound my gun makes startles me (this workshop)
  • normal
  • hardcore

What about our other presenters?

Here to learn together

We are here to learn from you, too.

The plan is to live code in these notes, enabling us to edit these notes live using the wonderful xaringan::infinite_moon_reader function, incorporating feedback.

Notes for this workshop

We will all work from a personal copy of these slides.

After some introductory waffling, and set-up, we’ll download the file together.

You will be able to annotate these notes or write extra code examples for yourself.

Indeed, there is little code provided in these notes. But there are lots of functions. We will write the code together.

Pseudo code

So that you have the experience of getting your hands dirty with code, we’ll use pseudo code.

<this is pseudo code>

install.packages("<you write code here>")

If I wanted to install a package called metafor, I would use the code

install.packages("metafor")

Notice that we drop the <>. That’s not code, it’s pseudo code to highlight where your input is.

Survival tips

Don’t be afraid to harness all tools available:

  • Modern day coding practice comprises almost entirely searching “how do I do <x> in <language>”.
  • The community is your best resource; talk to each other and make friends and future collaborators.
  • Cheatsheets (Help-Cheatsheets)

Finally, everyone needs to ragequit sometimes.

via GIPHY

Stickynote signalling

yellow

Yup, all good.

pink (the ragequit option)

Some assistance would be greatly appreciated.

Also, I think I broke the internet.

green

I did it already. Trying an extension question, want to join me?

What is R?

What do R-users use R for?

R users are data detectives.

We’re going to try to cover everything except Model.

This image is from R for Data Science, a great text to get started with. It’s available online as a free ebook.

For us to learn how to be data detectives, we’ll need some data.

A fitting dataset

A fitting dataset

To familiarise ourselves with R, we will do what R users do.

We will explore a dataset.

A fitting dataset.

A few images to explain why a dataset about witch trials might be appropriate for a workshop hosted by an advocacy group for underrepresented genders.

Image search for “witch” + “michelle wolf”

Image search for “witch” + “julia gillard”

Image search for “witch” + “hillary clinton”

A fitting dataset

The inspiration for this workshop was an analysis by Steph de Silva, of useR! 2018 keynote renown.

Steph, please tell us about the paper that generated this dataset.

Steph delights all and sundry.

Thanks, Steph!

Questions, questions that need answering

What would you like to know?

Group discussion

What would you like to know?

Write questions on whiteboard

How to answer these questions?

For this workshop:

Why these tools?

  • R was intentionally developed to be a data analysis language (aeroplane)
  • RStudio is designed to help users use R (airport)

“The plane is pretty boring without the airport around it.”

(Tip of the hat to Julia Lowndes for the aeroplane analogy.)

Installation hell

Installation overview

  • install R
  • install RStudio

Installation instructions adapted with appreciation from a previous workshop.

Install R on Mac or Windows

Go to the Comprehensive R Archive Network(CRAN) website.

It was first in a google search for ‘cran’ in June 2018.


For Mac users

  • Click on Download R for (Mac) OS X
  • Look at the top link under Latest release, which at time of writing is R-3.5.0.pkg, and download this if compatible with your current version mac OS (Mavericks 10.9 or higher). Otherwise download the version beneath it which is compatible for older mac OS versions.

For Windows users

  • Click on Download R for Windows
  • Then click on the link install R for the first time
  • Download from the large link at the top of the page which at time of writing is Download R 3.5.0 for Windows.

Install R

  • Then double click the downloaded R-3.5.0.pkg file and follow the prompts to install the downloaded software.

Download & Install RStudio:

Go to the RStudio website.

It was first in a google search for ‘rstudio’ in June 2018.

Choose RStudio and scroll down to the blue Download RStudio Desktop button.

Click the green button to download RStudio Desktop Open Source License and select appropriate installer for your operating system.

Double click the installer and follow the prompts to set up RStudio.

Why RStudio?

Working in an RStudio project has many benefits.

  • Free, open source software R in an IDE
  • Reproducible workflows

Getting started in RStudio

Create a project

  • Open RStudio and create a project via File-New Project
  • Select New Directory and choose New Project
  • Name your project rcurious
  • Save the project directory wherever suits you

Argh, so many windows

R-Ladies presenters gesticulate wildly at RStudio

Let’s start with a couple useful panes.

  • Console
  • Environment

Cheat sheet

Help-Cheatsheets-RStudio IDE Cheatsheet

How to run code?!

Running code in the Console

The console is where you can execute single-line R commands.

The console is located, by default, in the lower left pane.

Try 3 + 2 and press enter:

# Type 3 + 2 here. Here's hoping infinite moon reader works. 
3+2
## [1] 5

Environment

I can store the number 5 in an object x.

To assign a value we use an arrow <-.

x <- 5

What happens when you type x into the Console after assigning the value 5 to it?

What do you see in the Environment pane?


If I assign the value 5 to the object x, and call it in the console, it returns the value assigned.

# Assign 5 to x and call x.

There are many complex data types in R.

Data structures in R

Data objects can be made of

  • numbers
  • characters “this is a character string”
  • logicals (TRUE/FALSE)
  • list (mixed)

Or tables of combinations of the above.

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa

And much more.


The witch trials dataset is a table.

It is a tidy data structure:

  • rows are observations
  • columns are variables (attributes)

Editor

We’ll do this analysis in R markdown.

  • run code in chunks
  • write in text, LaTeX, html, outside of code chunks
  • make websites, presentations, books

Rmarkdown

Opening an .Rmd

Open File-New File-R Markdown

Follow the prompts to install the required packages.

Give your document a title and press ok.

This will open an Untitled2 template.

Save your document - note, you have given your document a title, not saved it!

Press the wheel next to the knit icon at the top of the Editor pane and select Viewer.

Now we will knit our document.

The tutorial document

A treasure trove!

Get these notes

Delete everything in your .Rmd file.

Look up softloud charles gray github

You want to navigate to: https://github.com/softloud

Click on rcurious in my Pinned repositories.

Click on workshop and then workshop.Rmd and then click the Raw button and copy and paste the text into your .Rmd file.

  • control + a to select all
  • control + c to copy
  • control + v to paste

Knit the notes

control + shift + k to knit

But it is annoying having a pop out window, so let’s view these notes in the Viewer pane.

We are going to modify the YAML at the top of the document so that you can see how to toggle a presentation into notes.

Press the little document outline button in the top right corner of the Editor pane, and navigate to the top of the document.

output :
  # ioslides_presentation 
  html_document: 
     toc: true
     toc_float: true

Knit again. (This takes a while.)

Slides and notes

From now on, we’ll refer to both the notes and the slides.

Your notes are made from the same code as the slides we’re looking at.

If you want information from a few slides ago, you can find it in your notes.

Your turn

Navigate in the Viewer pane using the floating table of contents to the image of Hillary Clinton in the section called A fitting dataset.

Navigate back to this section in your Editor pane by using the Document Outline provided by the button in the top right of the Editor.

Annotating these notes

Now you can take notes directly into the script file for these slides.

Let’s create a section at the top of the document called Useful tips.

We’ll create a heading directly above the section called Hi. (Use your Document Outline to navigate there.)

It looks likes this at the top of the script file.

## Hi! 
  • To create a heading use #. The more ###, the smaller the heading.

We can add bullet points with -. So,

- Euclid
- Beanie

renders in your output like this:

  • Euclid
  • Beanie

Cheatsheet

You can make tables, insert images and videos, and much more!

There are two cheatsheets on RMarkdown in Help-Cheatsheets.

Your turn

Ask around for a “what I wish I’d known” story about R or investigate a cheatsheet.

Find a tip that sounds useful to you and add it to your notes in your Useful tips section as a bullet point.

Knit your document and bask in the pretty!

Knitting taking too long?

Try deleting the images. You can always download a fresh copy of the notes if you want to get them back.

Packages

Packages

Packages are collections of other people’s code.

Often someone has already written a script that does what you want to do.

For example, we want to import the witch trials data. We will use a package that helps with data wrangling tasks like this, the tidyverse.

We’re going to use the metapackage tidyverse to help us with our data analysis.

Functions

The most common element of packages are functions. R also comes preloaded with a base of functions commonly used.

Functions run other people’s code for us, so that we don’t have to reinvent the wheel. We will use functions to intall and load the tidyverse.

How to spot a function

  • functions in R take the form <function()>

Getting more information

To learn more about a function, type ?function into the Console, and the Help pane will display documentation.

Installing packages

We want to install the package tidyverse.

Take a look at the help documentation for the function install.packages().

For installation; i.e., first time only.

install.packages("<name of package>")

For using.

library(<name of package>)

Pseudo code

A friendly reminder that <> indicates pseudo code. You can navigate back to that section in your notes in your Viewer.

Your turn

We would like to install and use a package called “tidyverse”.

Let’s try.

Your turn

Let’s load the tidyverse

Press the little document outline button in the top right corner of the Editor pane.

Navigate to this part of your notes and add code in the chunk that will load the tidyverse package.

# This is a code chunk. 

# We can write informative comments with a hash # at the start.

# Load the tidyverse using the library() function. 
# Press the green arrow in the top right corner of the chunk to run!

# Don't forget, you need to install the package before you can use it. 

#install.packages("tidyverse")

librarian::shelf(tidyverse)
## Warning in librarian::shelf(tidyverse): cran_repo = '@CRAN@' is not a valid URL. 
##                     Defaulting to cran_repo = 'https://cran.r-project.org'.
## ── Attaching packages ────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.4
## ✔ tidyr   0.7.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.4.4
## Warning: package 'stringr' was built under R version 3.4.4
## ── Conflicts ───────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Importing data

Import the witch trials data

Since the data is stored on an online repository, we can import it via url.

We can import this data using the read_csv() function from the tidyverse.

This function takes one argument, the url, which goes between the () as a “character string”.

There are many types file types, which often need special care.

Presenters discuss different file types and old battle stories of importing data.

Your turn

The data is found here: “https://raw.githubusercontent.com/JakeRuss/witch-trials/master/data/trials.csv

Try importing the data at the console with read_csv. What output do you see?

read_csv with the argument url produces a data object. An object we can assign.

Open a code chunk here and read the data in using read_csv and assign <- the data to an object called witchdat

  • control+alt+i to open a code chunk.
  • press green play to run chunk.

Take a look

What do you see in your Environment?

Click on it!

Exploratory data analysis

Exploratory data analysis

Let’s explore the information in this table.

  • summary
  • wrangling and cleaning
  • visualisation

Summary functions

Lots of objects in R <an R object> are friendly to the summary(<an R object>) function.

Your turn

What’s is the output of summary for witchdat?


An alternative to summary

An alternative is the skim() function from from the skimr package.

Your turn

  • install the skimr package
  • Open a code chunk here and load the skimr package in your notes
  • apply the skim function to the witchdat data

What is the difference between the output of summary and skim? Which do you like better and why?

Group discussion

Based on this new information, what questions do we add or update?

At this point, we often wish to manipulate the data in some way.

This is variously known as wrangling, cleaning, and scrubbing.

Data wrangling

The fine art of wrangling

Change the name of a variable

This workshop is based on Steph de Silva’s wonderful analysis Witch Hunting in Europe: a discovery of missingness.

One of the first things that Steph does is change the name of the column gadm.adm0 to something more human-interpretable, country.

Steph notes, “This is a gross oversimplification of geography in the middle ages of Europe, but it describes the location of the trial in terms that will be most familiar to many modern users.”

Follow in the footsteps of greatness

Let’s use the rename() function to change the name of the variable (column) gadm.adm0 to country.

To do this, we’ll learn a very useful operator, the pipe %>%.

The pipe

Piping makes code easier to read (arguably).

For example, we saw a snapshot of the preloaded iris data earlier.

The head() function takes one argument, a table:

head(<some data table>). 

But we could also pipe %>% the data into the function.

<some data table> %>% head() 

Your turn

Use the pipe function to present the top of the witchdat dataset.

Rename a column

We can rename a column by constructing a pipe from the table to the rename function

<my data> %>% 
  rename(<newname> = <oldname>)

Your turn

Pipe witchdat to the rename function and change gadm.adm0 to country.

Visualisation

The structure of a ggplot

The tidy way of plotting data is with the ggplot2 package, which comes with the tidyverse.

<some data> %>% # pipe the data to ggplot()

Your turn

What happens when you %>% the witchdat table into ggplot()?

Aesthetics

We define x and y axes of the plot with aesthetics in ggplot.

<some data> %>% 
  ggplot(aes(x = <column name>, y = <column name>))

Your turn

What happens if you set x and y axes to column names in aes?

Your turn: my first ggplot

Let’s see how many women were murdered as witches over time.

We’ll add a plot layer + to our ggplot using geom_point for a scatterplot.

Set the x axis to year and the y axis to deaths.

<data> %>% 
  ggplot(aes(x = <column with year>, y = <column with deaths>)) +
  geom_point() # Adds a scatterplot.

Your turn

Using the data visualisation cheatsheet (Help-Cheatsheets) to figure out how to add a title layer to your plot.

Extension exercises

Questions, find the answers

In groups or alone, choose a question to answer.

Use cheatsheets, talk to eachother.

  • Prepare a presentation for those who don’t get that far describing (using summary data and visualisations) what questions can and can’t be investigated by these data. For example, are the observations restricted in time or location?
  • Use the leaflet package to plot where the witches died sized by number of deaths. Make the dots a nice blood red.
  • Ask a different table you what questions they have about these data and see if you can prepare a markdown presentation to report your results.
  • Why does Steph de Silva talk about the missingness?

Who’s up for presenting their results?

Further reading

Further reading

The rstats community is the best

I thought you might be interested in a conversation I had while preparing this workshop. The rstats community is your best resource.

Sidenote, I’d recently seen Maelle Salmon speak about the benefits of blogging to engage with the community.

This was the first time I tweeted about a blogpost; it is very reassuring to have people to bounce ideas off.

Thank you for getting your R-curious on with us!

We hope you had fun.

Special thanks

  • All the R-Ladies who contributed to this workshop
    • Di
    • Kim
    • Steph
    • Sarah K.
    • Sarah R.
  • Miles for git support
  • test specimen Dr X